A percolate table is a special table that stores queries rather than documents. It is used for prospective searches, or "search in reverse."
- To learn more about performing a search query against a percolate table, see the section Percolate query.
- To learn how to prepare a table for searching, see the section Adding rules to a percolate table.
The schema of a percolate table is fixed and contains the following fields:
| Field | Description |
|---|---|
| ID | An unsigned 64-bit integer with auto-increment functionality. It can be omitted when adding a PQ rule, as described in Adding rules to a percolate table. |
| Query | The full-text query of the rule, which can be thought of as the value of a MATCH clause or of a JSON /search request. If per-field operators are used inside the query, the full-text fields must be declared in the percolate table configuration. If the stored query is intended only for attribute filtering (without full-text querying), the query value can be empty or omitted. The value of this field should correspond to the expected document schema, which is specified when the percolate table is created. |
| Filters | Optional. A string containing attribute filters and/or expressions, defined the same way as in a WHERE clause or in JSON filtering. The value of this field should correspond to the expected document schema, which is specified when the percolate table is created. |
| Tags | Optional. A comma-separated list of string labels that can be used for filtering or deleting PQ rules. The tags can also be returned along with matching documents when performing a Percolate query. |
Note that you do not need to add the above fields when creating a percolate table.
What you do need to keep in mind when creating a percolate table is to specify the expected schema of a document; the rules you add later will be checked against it. This is done in the same way as for any other local table.
- SQL
- JSON
- PHP
- Python
- Python-asyncio
- JavaScript
- Java
- C#
- Rust
- TypeScript
- Go
- CONFIG
CREATE TABLE products(title text, meta json) type='pq';
POST /cli -d "CREATE TABLE products(title text, meta json) type='pq'"
$index = [
'table' => 'products',
'body' => [
'columns' => [
'title' => ['type' => 'text'],
'meta' => ['type' => 'json']
],
'settings' => [
'type' => 'pq'
]
]
];
$client->indices()->create($index);
utilsApi.sql('CREATE TABLE products(title text, meta json) type=\'pq\'')
await utilsApi.sql('CREATE TABLE products(title text, meta json) type=\'pq\'')
res = await utilsApi.sql('CREATE TABLE products(title text, meta json) type=\'pq\'');
utilsApi.sql("CREATE TABLE products(title text, meta json) type='pq'");
utilsApi.Sql("CREATE TABLE products(title text, meta json) type='pq'");
utils_api.sql("CREATE TABLE products(title text, meta json) type='pq'", Some(true)).await;
res = await utilsApi.sql("CREATE TABLE products(title text, meta json) type='pq'");
apiClient.UtilsAPI.Sql(context.Background()).Body("CREATE TABLE products(title text, meta json) type='pq'").Execute()
table products {
type = percolate
path = tbl_pq
rt_field = title
rt_attr_json = meta
}
Query OK, 0 rows affected (0.00 sec)
{
"total":0,
"error":"",
"warning":""
}
Array(
[total] => 0
[error] =>
[warning] =>
)
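The "search in reverse" workflow can be sketched in plain Python. This is a conceptual illustration only: the `PercolateSketch` class and its word-set matching are hypothetical stand-ins for Manticore's stored queries and full-text matching, not part of any real API.

```python
# Conceptual sketch of a percolate ("reverse search") workflow:
# queries are stored up front, then each incoming document is
# matched against all stored queries.

class PercolateSketch:
    def __init__(self):
        self.rules = []  # (rule_id, set of required words, tags)

    def add_rule(self, rule_id, query, tags=()):
        # Store the query as a set of required words (a very simplified
        # stand-in for Manticore's full-text query parsing).
        self.rules.append((rule_id, set(query.lower().split()), tuple(tags)))

    def percolate(self, document):
        words = set(document.lower().split())
        # Return the ids of all rules whose required words are all present.
        return [rid for rid, required, _ in self.rules if required <= words]

pq = PercolateSketch()
pq.add_rule(1, "shoes", tags=("footwear",))
pq.add_rule(2, "leather bag")
print(pq.percolate("new leather bag with shoes"))  # → [1, 2]
```

The real engine does far more (operators, per-field matching, attribute filters), but the inversion is the same: documents flow through a fixed set of stored queries rather than queries flowing through stored documents.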
A template table is a special type of table in Manticore that doesn't store any data and doesn't create any files on disk. Despite this, it can have the same NLP settings as a plain or real-time table. Template tables can be used for the following purposes:
- As a template to inherit settings in the Plain mode, simplifying your Manticore configuration file.
- Keyword generation with the help of the CALL KEYWORDS command.
- Highlighting an arbitrary string using the CALL SNIPPETS command.
- CONFIG
table template {
type = template
morphology = stem_en
wordforms = wordforms.txt
exceptions = exceptions.txt
stopwords = stopwords.txt
}
⪢ NLP and tokenization
Manticore doesn't store text exactly as it is for full-text searching. Instead, it breaks the text into words (called tokens) and builds several internal structures to enable fast full-text searches. These structures include a dictionary that helps quickly check if a word exists in the index. Other structures track which documents and fields contain the word, and even where exactly in the field it appears. These are all used during a search to find relevant results.
The process of splitting and handling text like this is called tokenization. Tokenization happens both when adding data to the index and when running a search. It works at both the character and word level.
At the character level, only certain characters are allowed. This is controlled by the charset_table setting. Any other characters are replaced with a space (which is treated as a word separator). The charset_table setting also supports things like turning characters into lowercase or replacing one character with another. It can also define characters to be ignored, blended, or treated as a phrase boundary.
At the word level, the engine uses the min_word_len setting to decide the minimum word length (in characters) that should be indexed.
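These two levels can be illustrated with a short Python sketch. The allowed-character set and the length threshold below are illustrative choices, not Manticore's actual charset_table or min_word_len defaults.

```python
import re

def tokenize(text, min_word_len=3):
    # Character level: keep only lowercase letters and digits (a toy
    # stand-in for charset_table), lowercasing everything and turning
    # all other characters into spaces (word separators).
    cleaned = re.sub(r"[^a-z0-9]+", " ", text.lower())
    # Word level: drop tokens shorter than min_word_len.
    return [t for t in cleaned.split() if len(t) >= min_word_len]

print(tokenize("Hello, C++ world!"))  # → ['hello', 'world']
```

Note how "C++" disappears entirely: the "+" characters become separators and the remaining "c" falls below the length threshold, which is exactly why character-level rules need care for technical content.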
Manticore also supports matching words with different forms. For example, to treat "car" and "cars" as the same word, you can use morphology processors.
If you want different words to be treated as the same (for example, "USA" and "United States"), you can define them using the word forms feature.
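Here is a minimal sketch of how morphology and word forms normalize tokens before indexing. The mapping table and the trailing-"s" rule are toy illustrations, not Manticore's stemmer or word forms file format.

```python
# Hypothetical normalization step: a toy "stemmer" strips a trailing
# "s" (a crude stand-in for morphology processors such as stem_en),
# then word forms are applied as a lookup table.
WORDFORMS = {"usa": "united states"}

def normalize(token):
    # Toy stemming: treat "cars" and "car" as the same word.
    if token.endswith("s") and len(token) > 3:
        token = token[:-1]
    return WORDFORMS.get(token, token)

print([normalize(t) for t in ["cars", "car", "usa"]])
# → ['car', 'car', 'united states']
```

The key point is that both features run at tokenization time, for indexed documents and for search queries alike, so both sides end up with the same normalized tokens.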
Very common words (like "the", "and", "is") can slow down searches and increase the index size. You can filter them out using stop words, making searches faster and the index smaller.
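Stop-word filtering amounts to dropping listed tokens during tokenization; the stop list below is a tiny illustrative sample, not a real stopwords file.

```python
STOPWORDS = {"the", "and", "is"}

def drop_stopwords(tokens):
    # Removing very common words shrinks the index and speeds up
    # search, at the cost of not being able to match them later.
    return [t for t in tokens if t not in STOPWORDS]

print(drop_stopwords(["the", "cat", "is", "black"]))  # → ['cat', 'black']
```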
A more advanced filtering method is bigrams, which creates special tokens by combining a common word with an uncommon one. This can significantly speed up phrase searches when common words are involved.
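The bigram idea can be sketched as follows. The common-word list and the combined-token format here are illustrative; in Manticore the real behavior is governed by the bigram settings in the table configuration.

```python
COMMON = {"the", "a", "in"}

def with_bigrams(tokens):
    # Emit the original tokens plus a combined token whenever a common
    # word sits next to an uncommon one, so a phrase search involving a
    # common word can match one combined token instead of two positions.
    out = list(tokens)
    for left, right in zip(tokens, tokens[1:]):
        if (left in COMMON) != (right in COMMON):
            out.append(f"{left}_{right}")
    return out

print(with_bigrams(["a", "study", "in", "scarlet"]))
```

The index gets bigger (extra tokens are stored), but phrase matching of pairs like "a study" becomes a single dictionary lookup rather than a positional intersection.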
If you're indexing HTML, it's usually best not to include the HTML tags in the index, since they add a lot of unnecessary content. You can use HTML stripping to remove the tags, but still index certain tag attributes or skip specific elements entirely.
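To show what stripping is doing conceptually, here is a minimal stripper built on Python's standard-library `html.parser`; it keeps text content, drops tags, and skips the contents of elements you never want indexed. It is a sketch of the idea, not Manticore's HTML stripping implementation.

```python
from html.parser import HTMLParser

class TextExtractor(HTMLParser):
    # Elements whose contents should be skipped entirely.
    SKIP = {"script", "style"}

    def __init__(self):
        super().__init__()
        self.parts = []
        self.depth = 0  # inside a skipped element when > 0

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP:
            self.depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIP and self.depth:
            self.depth -= 1

    def handle_data(self, data):
        if not self.depth:
            self.parts.append(data)

def strip_html(html):
    parser = TextExtractor()
    parser.feed(html)
    return "".join(parser.parts)

print(strip_html("<p>Hello <b>world</b><script>x()</script></p>"))
# → Hello world
```

Manticore's stripper additionally lets you keep selected tag attributes (for example, `alt` text) while removing everything else, which this sketch does not attempt.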
Keep in mind that Manticore has a maximum token length of 42 characters. Any word longer than this will be truncated. This limit applies during both indexing and searching, so it's important to ensure your data and queries account for it.
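The effect of the limit can be shown with a tiny helper; the simple slice below is a simplification of what the engine does internally.

```python
MAX_TOKEN_LEN = 42  # Manticore's maximum token length in characters

def truncate_token(token):
    # Tokens longer than the limit are cut down, so two words that
    # differ only after the 42nd character become indistinguishable
    # in the index and in queries.
    return token[:MAX_TOKEN_LEN]

long_word = "x" * 50
print(len(truncate_token(long_word)))  # → 42
```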